sst5 support instead of sst2 #1

nikolawhallon · 2022-11-04T18:31:29Z

For sentiment, sst5 predicts 5 sentiment categories - "very positive", "positive", "neutral", "negative", and "very negative".

sst2 predicts 2 sentiment categories - "positive" and "negative".

rust-bert only supports sst2, not sst5, but we want to use sst5.
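To make the difference concrete, here is a minimal sketch of the five sst5 categories and one way to collapse them to a two-way polarity. The type names `Sst5Class` and `Polarity` are hypothetical and are not rust-bert types. The id ordering (0 = very negative up to 4 = very positive) is an assumption: it is consistent with the diff in this PR treating ids 3 and 4 as positive, but nothing here confirms it for any particular model.

```rust
/// The five sst5 categories listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Sst5Class {
    VeryNegative,
    Negative,
    Neutral,
    Positive,
    VeryPositive,
}

/// Two-way polarity, as in the sst2 case (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Polarity {
    Positive,
    Negative,
}

impl Sst5Class {
    /// ASSUMPTION: ids 0..=4 run from very negative to very positive.
    /// Returns None for any id outside that range.
    pub fn from_id(id: i64) -> Option<Self> {
        match id {
            0 => Some(Self::VeryNegative),
            1 => Some(Self::Negative),
            2 => Some(Self::Neutral),
            3 => Some(Self::Positive),
            4 => Some(Self::VeryPositive),
            _ => None,
        }
    }

    /// Collapse to two classes: positive and very positive become
    /// Positive, and everything else (including neutral) becomes Negative.
    pub fn to_polarity(self) -> Polarity {
        match self {
            Self::Positive | Self::VeryPositive => Polarity::Positive,
            _ => Polarity::Negative,
        }
    }
}
```

Note that in this collapse "neutral" ends up on the negative side, because only two output values are available.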

This is a somewhat ugly (though not that bad, imo) hot-fix. The proper solution might be to work on getting rust-bert to support both, or rather to support reading a config file with a label -> sentiment mapping, although that would remove the ability to have the sentiment values be in an enum. I don't see how to do that without making a breaking API change.

This is also added on top of a particular commit - I was not able to easily update our rust-bert to 0.19.

mlodato517 · 2022-11-11T19:14:50Z

Sorry - could you give more details here? 😅 What's "sst2"? What's "sst5"?

nikolawhallon · 2022-11-11T19:27:08Z

> Sorry - could you give more details here? 😅 What's "sst2"? What's "sst5"?

I added some such info in the description.

mlodato517

Seems reasonable to me so far! Other implementations can be figured out if/when we contribute this upstream or if/when we change our production model.

mlodato517 · 2022-11-11T19:29:47Z

src/pipelines/sentiment.rs

@@ -141,8 +142,10 @@ impl SentimentModel {
        let labels = self.sequence_classification_model.predict(input);
        let mut sentiments = Vec::with_capacity(labels.len());
        for label in labels {
-            let polarity = if label.id == 1 {
+            let polarity = if label.id == 4 || label.id == 3 {


Okay, so this basically gains us sst5 but loses us sst2, which is fine because the production model we're currently using is sst5. And this is why, as you mention in the PR description, it'd be great to get this into rust-bert as something configurable per model, so the sentiment pipeline works with whatever IDs the underlying model returns?
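A rough sketch of what "configurable per model" could look like, assuming the pipeline only needs to know which label ids count as positive. `PolarityMapping` and its constructors are hypothetical names for illustration and are not part of rust-bert. The two presets mirror the two conditions in the diff above (`label.id == 1` before, `label.id == 4 || label.id == 3` after).

```rust
/// Two-way polarity (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Polarity {
    Positive,
    Negative,
}

/// Hypothetical per-model setting: the label ids that count as positive.
#[derive(Debug, Clone)]
pub struct PolarityMapping {
    positive_ids: Vec<i64>,
}

impl PolarityMapping {
    /// Mirrors the condition removed by the diff: only id 1 is positive.
    pub fn sst2() -> Self {
        Self { positive_ids: vec![1] }
    }

    /// Mirrors the condition added by the diff: ids 3 and 4 are positive.
    pub fn sst5() -> Self {
        Self { positive_ids: vec![3, 4] }
    }

    /// Any id not listed as positive is reported as negative.
    pub fn polarity(&self, label_id: i64) -> Polarity {
        if self.positive_ids.contains(&label_id) {
            Polarity::Positive
        } else {
            Polarity::Negative
        }
    }
}
```

With something like this, the same pipeline code could serve either kind of model, e.g. `PolarityMapping::sst5().polarity(3)` versus `PolarityMapping::sst2().polarity(1)`.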

nikolawhallon · 2022-11-11

Yes - and I think it would be a fun rust-bert contribution to try, but maybe tricky.

mlodato517 · 2022-11-11

Yeah, I wonder how these models work and whether we could get "lucky" with something like "specify the neutral range, and anything above it is positive and anything below it is negative". But it's technically more flexible to require the mapping, as you say. It would be interesting to try and figure out.
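For illustration, the "neutral range" idea could be sketched as below. This only works under the assumption that a model's label ids are ordered by sentiment, which is not guaranteed. `NeutralRange` and `Sentiment` are hypothetical names, not rust-bert types.

```rust
/// Three-way result for this sketch (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Sentiment {
    Positive,
    Neutral,
    Negative,
}

/// Inclusive range of label ids treated as neutral (hypothetical).
/// ASSUMPTION: label ids increase with how positive the class is.
pub struct NeutralRange {
    pub low: i64,
    pub high: i64,
}

impl NeutralRange {
    /// Ids above the range are positive, ids below it are negative,
    /// and ids inside it are neutral.
    pub fn classify(&self, label_id: i64) -> Sentiment {
        if label_id > self.high {
            Sentiment::Positive
        } else if label_id < self.low {
            Sentiment::Negative
        } else {
            Sentiment::Neutral
        }
    }
}
```

A five-class model might use `NeutralRange { low: 2, high: 2 }`, while a two-class model with no neutral class could use an empty range such as `NeutralRange { low: 1, high: 0 }`. A model whose ids are not ordered this way would still need an explicit mapping.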

nikolawhallon · 2022-11-11

Well, the models often come with something like config.json, which explicitly contains the mapping.
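As a sketch of how such a mapping could drive the pipeline, the function below builds an id -> polarity table from (id, label name) pairs. It assumes those pairs have already been read from the model's config file (an `id2label`-style table is assumed here, and the file parsing is left out). `polarity_table` and the name-matching rule are hypothetical, not an existing rust-bert API.

```rust
use std::collections::HashMap;

/// Two-way polarity (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Polarity {
    Positive,
    Negative,
}

/// Build an id -> polarity table from (id, label name) pairs.
/// ASSUMPTION: a label whose name contains "positive" (case-insensitive)
/// is positive, and every other label (including "neutral") is negative.
pub fn polarity_table(id2label: &[(i64, &str)]) -> HashMap<i64, Polarity> {
    id2label
        .iter()
        .map(|&(id, name)| {
            let polarity = if name.to_lowercase().contains("positive") {
                Polarity::Positive
            } else {
                Polarity::Negative
            };
            (id, polarity)
        })
        .collect()
}
```

Looking up an id that the table does not contain returns `None`, so an unexpected label id can be surfaced as an error instead of silently being treated as negative.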

nikolawhallon · 2022-11-11T19:32:46Z

> Seems reasonable to me so far! Other implementations can be figured out if/when we contribute this upstream or if/when we change our production model.

Thanks for double checking this. One question remains: should this actually get merged? The PR is useful to get a review, but I imagine it's easier to keep this in a branch which we point to. Is there a canonical preference?

mlodato517 · 2022-11-11T19:37:20Z

> should this actually get merged? The PR is useful to get a review, but I imagine it's easier to keep this in a branch which we point to. Is there a canonical preference?

Yeah that is a good question. I feel like normally the process is:

1. fork repo
2. make branch with fixes you need
3. make PR from your fork to upstream repo
4. reference your branch in your fork while you
5. update to use upstream when PR is merged

but here we're like 100% sure that the thing we PR to rust-bert will not be the same as this branch 🤔

Might be best to dismiss my approval and add commits to this branch to approach the thing we think we'd submit to rust-bert. Then, one day, we can squash all the commits and make a PR to them without disrupting our use of it?

I think we could also merge this and other PRs into our master and then one day make a PR from our master to their master that, in sum, has "the right change".

nikolawhallon added 2 commits November 4, 2022 11:29

sst5 support instead of sst2

a9aa00f

fixed off-by-one sentiment

cd81a83

mlodato517 approved these changes Nov 11, 2022
